Toward a General I/O Arbitration Framework for netCDF Based Big Data Processing
Abstract
As high performance computing (HPC) and Big Data processing converge, it has become increasingly common to deploy large-scale data analytics workloads on high-end supercomputers. Such applications often take the form of complex workflows with many components, assimilating data from scientific simulations as well as measurements streamed from sensor networks, such as radars and satellites. For example, as part of Japan's next-generation flagship (post-K) supercomputer project, RIKEN is investigating the feasibility of a highly accurate weather forecasting system that would provide a real-time outlook for severe guerrilla rainstorms. One of the main performance bottlenecks of this application is the lack of efficient communication among workflow components, which currently takes place over the parallel file system. In this paper, we present an initial study of a direct communication framework designed for complex workflows that eliminates unnecessary file I/O among components. Specifically, we propose an I/O arbitrator layer that provides direct parallel data transfer among job components that rely on the netCDF interface for I/O, with only minimal modifications to application code. We present the design and an early evaluation of the framework on the K Computer using up to 4800 nodes, with RIKEN’s experimental weather forecasting workflow as a case study.
Similar resources
A flexible I/O arbitration framework for netCDF-based big data processing workflows on high-end supercomputers
1 College of Computer and Information Science, Southwest University of China, Chongqing, China; 2 State Key Laboratory for Novel Software Technology, Nanjing University, Jiangsu, China; 3 RIKEN Advanced Institute for Computational Science, Kobe, Japan; 4 Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL, USA. Correspondence: Jianwei Liao, College of Computer...
Ophidia: Toward Big Data Analytics for eScience
This work introduces Ophidia, a big data analytics research effort aiming at supporting the access, analysis and mining of scientific (n-dimensional array based) data. The Ophidia platform extends, in terms of both primitives and data types, current relational database system implementations (in particular MySQL) to enable efficient data analysis tasks on scientific array-based data. To enable ...
2016 Olympic Games on Twitter: Sentiment Analysis of Sports Fans Tweets using Big Data Framework
Big data analytics is one of the most important subjects in computer science. Today, due to the increasing expansion of Web technology, a large amount of data is available to researchers. Extracting information from these data is one of the requirements for many organizations and business centers. In recent years, the massive amount of Twitter's social networking data has become a platform for ...
Implementing a Parallel NetCDF Interface for Seamless Remote I/O Using Multi-dimensional Data
Parallel netCDF supports parallel I/O operations on data viewed as a collection of self-describing, portable, array-oriented objects that can be accessed through a simple interface. Its parallel I/O operations are realized with the help of an MPI-I/O library. However, such operations are not available for remote I/O. So, a remote I/O mechanism of a Stampi library was intro...
Feasibility Study of Effective Remote I/O Using a Parallel NetCDF Interface in a Long-Latency Network
NetCDF provides a portable and self-describing I/O data format for array-oriented data in scientific computation domains. Its parallel I/O interface, named parallel netCDF (hereafter PnetCDF), provides parallel I/O operations with the help of an MPI interface. To realize such operations through a PnetCDF interface among computers that have different MPI libraries, a Stampi library was introduced as...
Publication date: 2016